Effectiveness of Compiler-Directed Prefetching on Data Mining Benchmarks
Authors
Abstract
For today's increasingly power-constrained multicore systems, integrating simpler and more energy-efficient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been shown to be effective on some applications. On the application side, a large class of data-centric applications has emerged to explore the underlying properties of explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control flow, and intensive, irregular memory access patterns. They are expected to be the dominant workloads on future microprocessors. Thus, in this paper, we investigate the effectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often effectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become inefficient in the presence of complex control flow and memory access patterns, as well as architecture-dependent behaviors. The integration of multithreaded execution onto a single die makes it even more difficult for the compiler to insert prefetch instructions, since optimizations that are effective for single-threaded execution may or may not be effective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that ...
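As a rough illustration of what an inserted prefetch instruction looks like, the sketch below adds a software prefetch to an index-driven loop, the kind of irregular access pattern the abstract mentions. It is not taken from the paper: the function name `sum_indirect` and the constant `PREFETCH_DISTANCE` are hypothetical, and the code assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A compiler would
   normally derive it from memory latency and per-iteration cost. */
#define PREFETCH_DISTANCE 8

/* Sum data[idx[i]] for i in [0, n): an irregular, index-driven access. */
long sum_indirect(const long *data, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only: read access (0), low temporal locality (1).
               The result is the same whether or not the hint helps. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        total += data[idx[i]];
    }
    return total;
}
```

The prefetch is purely a hint, so a poorly chosen distance does not change the result, only the timing; that is one reason the placement has to be judged per loop and per machine.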
Similar resources
The Interaction and Relative Effectiveness of Hardware and Software Data Prefetch
A major performance limiter in modern processors is the long latencies caused by data cache misses. Both compiler and hardware based prefetching schemes help hide these latencies and so improve performance. Compiler techniques infer memory access patterns through code analysis, and insert appropriate prefetch instructions. Hardware prefetching techniques work independently from the compiler by ...
A Compiler-Assisted Data Prefetch Controller
Data-intensive applications often exhibit memory referencing patterns with little data reuse, resulting in poor cache utilization and run-times that can be dominated by memory delays. Data prefetching has been proposed as a means of hiding the memory access latencies of data referencing patterns that defeat caching strategies. Prefetching techniques that either use special cache logic to issue ...
C-Miner: Mining Block Correlations in Storage Systems
Block correlations are common semantic patterns in storage systems. These correlations can be exploited for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough to...
Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors
This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component used in this scheme is a data flow framework that exploits information about array access patterns and about the cache coherence protocol to predict at compile...
Efficient Integration of Compiler-directed Cache Coherence and Data Prefetching
Cache coherence enforcement and memory latency reduction and hiding are very important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to solve these problems through a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coh...
Journal title:
- Journal of Circuits, Systems, and Computers
Volume 21, Issue
Pages -
Publication date: 2012